Overview

Dataset statistics

Statistic                       Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   416         428
Missing cells (%)               7.8%        8.0%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
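The derived rows in this table can be checked against the raw counts. The following is a minimal sketch, assuming the missing-cell percentage is taken over all cells (variables × observations) and the average record size is total memory divided by observations; the helper names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_variables * n_observations)


def average_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average memory per row in bytes, given the total size in KiB."""
    return total_kib * 1024 / n_observations


# Figures copied from the dataset statistics table.
pct_a = missing_cells_pct(416, 12, 446)
pct_b = missing_cells_pct(428, 12, 446)
record_size = average_record_size_bytes(45.3, 446)
print(f"{pct_a:.1f}% {pct_b:.1f}% {record_size:.1f} B")
```

Under these assumptions the results round to the 7.8%, 8.0% and 104.0 B reported for the two datasets.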

Variable types

Type          Dataset A   Dataset B
Numeric       5           5
Categorical   7           7

Alerts

Dataset A                                            Dataset B                                            Alert type
Name has a high cardinality: 446 distinct values     Name has a high cardinality: 446 distinct values     High Cardinality
Ticket has a high cardinality: 379 distinct values   Ticket has a high cardinality: 376 distinct values   High Cardinality
Cabin has a high cardinality: 98 distinct values     Cabin has a high cardinality: 81 distinct values     High Cardinality
Fare is highly overall correlated with Pclass        Fare is highly overall correlated with Pclass        High Correlation
Survived is highly overall correlated with Sex       Survived is highly overall correlated with Sex       High Correlation
Pclass is highly overall correlated with Fare        Pclass is highly overall correlated with Fare        High Correlation
Sex is highly overall correlated with Survived       Sex is highly overall correlated with Survived       High Correlation
Age has 82 (18.4%) missing values                    Age has 82 (18.4%) missing values                    Missing
Cabin has 333 (74.7%) missing values                 Cabin has 345 (77.4%) missing values                 Missing
Name is uniformly distributed                        Name is uniformly distributed                        Uniform
Ticket is uniformly distributed                      Ticket is uniformly distributed                      Uniform
Cabin is uniformly distributed                       Cabin is uniformly distributed                       Uniform
PassengerId has unique values                        PassengerId has unique values                        Unique
Name has unique values                               Name has unique values                               Unique
SibSp has 295 (66.1%) zeros                          SibSp has 294 (65.9%) zeros                          Zeros
Parch has 336 (75.3%) zeros                          Parch has 333 (74.7%) zeros                          Zeros
Fare has 6 (1.3%) zeros                              Fare has 10 (2.2%) zeros                             Zeros

Reproduction

Statistic                Dataset A                    Dataset B
Analysis started         2023-01-30 17:09:49.374474   2023-01-30 17:09:54.506919
Analysis finished        2023-01-30 17:09:54.503590   2023-01-30 17:09:58.545820
Duration                 5.13 seconds                 4.04 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
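The Duration row is the difference between the finish and start timestamps. A small sketch using only the Python standard library shows the arithmetic; the function name and format constant are illustrative.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two report timestamps, in seconds."""
    return (datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)).total_seconds()


duration_a = duration_seconds("2023-01-30 17:09:49.374474", "2023-01-30 17:09:54.503590")
duration_b = duration_seconds("2023-01-30 17:09:54.506919", "2023-01-30 17:09:58.545820")
print(f"{duration_a:.2f} s, {duration_b:.2f} s")
```

Rounded to two decimals this gives 5.13 and 4.04 seconds, matching the table.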

Variables

PassengerId
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           456.78924   459.75112
Minimum        1           1
Maximum        890         889
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     1           1
5-th percentile             38.5        50.25
Q1                          231.75      249.25
median                      460         467.5
Q3                          685.25      671.75
95-th percentile            854.25      853
Maximum                     890         889
Range                       889         888
Interquartile range (IQR)   453.5       422.5
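The last two rows are derived from the others: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A brief sketch with the PassengerId figures follows; the helper name is illustrative.

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Range and interquartile range from four quantile statistics."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# Quantile statistics for PassengerId, copied from the table.
spread_a = spread(minimum=1, q1=231.75, q3=685.25, maximum=890)
spread_b = spread(minimum=1, q1=249.25, q3=671.75, maximum=889)
print(spread_a, spread_b)
```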

Descriptive statistics

Statistic                         Dataset A      Dataset B
Standard deviation                261.11357      255.6025
Coefficient of variation (CV)     0.57162812     0.5559584
Kurtosis                          -1.1928291     -1.1677702
Mean                              456.78924      459.75112
Median Absolute Deviation (MAD)   226.5          215
Skewness                          -0.055220217   -0.066689484
Sum                               203728         205049
Variance                          68180.297      65332.637
Monotonicity                      Not monotonic  Not monotonic
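Several of these statistics are linked: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks this for Dataset A, assuming 446 observations as stated in the overview; the variable names are illustrative.

```python
# PassengerId figures for Dataset A, copied from the report.
std_a = 261.11357
mean_a = 456.78924
n_observations = 446

cv_a = std_a / mean_a                # coefficient of variation
variance_a = std_a ** 2              # variance as the squared standard deviation
sum_a = mean_a * n_observations      # sum recovered from the mean

print(f"CV={cv_a:.8f} variance={variance_a:.3f} sum={sum_a:.0f}")
```

The results agree with the reported values up to the rounding of the inputs.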
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
428 1
 
0.2%
748 1
 
0.2%
790 1
 
0.2%
91 1
 
0.2%
556 1
 
0.2%
342 1
 
0.2%
862 1
 
0.2%
258 1
 
0.2%
228 1
 
0.2%
495 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
676 1
 
0.2%
645 1
 
0.2%
62 1
 
0.2%
854 1
 
0.2%
357 1
 
0.2%
319 1
 
0.2%
358 1
 
0.2%
889 1
 
0.2%
97 1
 
0.2%
706 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
1 1
0.2%
2 1
0.2%
3 1
0.2%
5 1
0.2%
7 1
0.2%
8 1
0.2%
9 1
0.2%
10 1
0.2%
13 1
0.2%
15 1
0.2%
ValueCountFrequency (%)
1 1
0.2%
3 1
0.2%
4 1
0.2%
5 1
0.2%
6 1
0.2%
15 1
0.2%
16 1
0.2%
19 1
0.2%
21 1
0.2%
25 1
0.2%

Survived
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
0       277         282
1       169         164

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   1           0
2nd row   0           1
3rd row   0           0
4th row   1           0
5th row   0           0

Common Values

Dataset A
Value   Count   Frequency (%)
0       277     62.1%
1       169     37.9%

Dataset B
Value   Count   Frequency (%)
0       282     63.2%
1       164     36.8%
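A common-values table of this kind is a count of each distinct value, with its share of all rows. The sketch below uses `collections.Counter`; the input list is rebuilt from the reported counts because the raw Survived column is not part of this report, and the function name is illustrative.

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, count, percent) tuples, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, round(100 * count / total, 1))
            for value, count in counts.most_common()]


# Stand-in for the Survived column of Dataset A, rebuilt from the reported counts.
survived_a = [0] * 277 + [1] * 169
table_a = value_frequencies(survived_a)
print(table_a)
```

For this input the function returns `[(0, 277, 62.1), (1, 169, 37.9)]`, the same figures listed for Dataset A.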

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Dataset B

ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%

Most occurring characters

ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 277
62.1%
1 169
37.9%
ValueCountFrequency (%)
0 282
63.2%
1 164
36.8%

Pclass
Categorical

Statistic      Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value   Dataset A   Dataset B
3       238         251
1       116         108
2       92          87

Length

Statistic       Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   2           3
2nd row   2           3
3rd row   3           3
4th row   3           2
5th row   3           1

Common Values

Dataset A
Value   Count   Frequency (%)
3       238     53.4%
1       116     26.0%
2       92      20.6%

Dataset B
Value   Count   Frequency (%)
3       251     56.3%
1       108     24.2%
2       87      19.5%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Dataset B

ValueCountFrequency (%)
3 238
53.4%
1 116
26.0%
2 92
 
20.6%
ValueCountFrequency (%)
3 251
56.3%
1 108
24.2%
2 87
 
19.5%

Most occurring characters

ValueCountFrequency (%)
3 238
53.4%
1 116
26.0%
2 92
 
20.6%
ValueCountFrequency (%)
3 251
56.3%
1 108
24.2%
2 87
 
19.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 238
53.4%
1 116
26.0%
2 92
 
20.6%
ValueCountFrequency (%)
3 251
56.3%
1 108
24.2%
2 87
 
19.5%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 238
53.4%
1 116
26.0%
2 92
 
20.6%
ValueCountFrequency (%)
3 251
56.3%
1 108
24.2%
2 87
 
19.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 238
53.4%
1 116
26.0%
2 92
 
20.6%
ValueCountFrequency (%)
3 251
56.3%
1 108
24.2%
2 87
 
19.5%

Name
Categorical

Statistic      Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Dataset A
Value                                                                 Count
Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")   1
Sinkkonen, Miss. Anna                                                 1
Guggenheim, Mr. Benjamin                                              1
Christmann, Mr. Emil                                                  1
Wright, Mr. George                                                    1
Other values (441)                                                    441

Dataset B
Value                            Count
Edvardsson, Mr. Gustaf Hjalmar   1
Baclini, Miss. Eugenie           1
Icard, Miss. Amelie              1
Lines, Miss. Mary Conover        1
Bowerman, Miss. Elsie Edith      1
Other values (441)               441

Length

Statistic       Dataset A   Dataset B
Max length      82          67
Median length   49.5        49
Mean length     27.329596   27.159193
Min length      13          13

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      12189       12113
Distinct characters   60          59
Distinct categories   7           7
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
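As an illustration of such properties, Python's standard `unicodedata` module returns the general category of each character. The sketch below tallies categories for one name taken from this report; it is not the profiler's own implementation, and the standard library covers general categories only, not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category (two-letter codes)."""
    return Counter(unicodedata.category(ch) for ch in text)


# Lu = Uppercase Letter, Ll = Lowercase Letter,
# Po = Other Punctuation, Zs = Space Separator.
counts = category_counts("Guggenheim, Mr. Benjamin")
print(dict(counts))
```

For this name the tally is 17 lowercase letters, 3 uppercase letters, 2 punctuation marks and 2 spaces, which are four of the seven categories the report lists for the Name variable.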

Unique

Statistic    Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

Row       Dataset A                                                             Dataset B
1st row   Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall")   Edvardsson, Mr. Gustaf Hjalmar
2nd row   Gill, Mr. John William                                                Landergren, Miss. Aurora Adelia
3rd row   Hakkarainen, Mr. Pekka Pietari                                        Strom, Mrs. Wilhelm (Elna Matilda Persson)
4th row   Heikkinen, Miss. Laina                                                Harris, Mr. Walter
5th row   Cribb, Mr. John Hatfield                                              Andrews, Mr. Thomas Jr

Common Values

ValueCountFrequency (%)
Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") 1
 
0.2%
Sinkkonen, Miss. Anna 1
 
0.2%
Guggenheim, Mr. Benjamin 1
 
0.2%
Christmann, Mr. Emil 1
 
0.2%
Wright, Mr. George 1
 
0.2%
Fortune, Miss. Alice Elizabeth 1
 
0.2%
Giles, Mr. Frederick Edward 1
 
0.2%
Cherry, Miss. Gladys 1
 
0.2%
Lovell, Mr. John Hall ("Henry") 1
 
0.2%
Stanley, Mr. Edward Roland 1
 
0.2%
Other values (436) 436
97.8%
ValueCountFrequency (%)
Edvardsson, Mr. Gustaf Hjalmar 1
 
0.2%
Baclini, Miss. Eugenie 1
 
0.2%
Icard, Miss. Amelie 1
 
0.2%
Lines, Miss. Mary Conover 1
 
0.2%
Bowerman, Miss. Elsie Edith 1
 
0.2%
Wick, Miss. Mary Natalie 1
 
0.2%
Funk, Miss. Annie Clemmer 1
 
0.2%
Johnston, Miss. Catherine Helen "Carrie" 1
 
0.2%
Goldschmidt, Mr. George B 1
 
0.2%
Morley, Mr. Henry Samuel ("Mr Henry Marshall") 1
 
0.2%
Other values (436) 436
97.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Dataset B


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
ValueCountFrequency (%)
mr 260
 
14.2%
miss 97
 
5.3%
mrs 64
 
3.5%
william 29
 
1.6%
john 24
 
1.3%
master 20
 
1.1%
henry 18
 
1.0%
charles 14
 
0.8%
george 14
 
0.8%
thomas 12
 
0.7%
Other values (896) 1281
69.9%
ValueCountFrequency (%)
mr 257
 
14.1%
miss 91
 
5.0%
mrs 71
 
3.9%
william 35
 
1.9%
john 22
 
1.2%
master 20
 
1.1%
henry 16
 
0.9%
thomas 13
 
0.7%
george 13
 
0.7%
edward 12
 
0.7%
Other values (872) 1279
69.9%

Most occurring characters

ValueCountFrequency (%)
1389
 
11.4%
r 1000
 
8.2%
e 880
 
7.2%
a 827
 
6.8%
n 688
 
5.6%
i 676
 
5.5%
s 637
 
5.2%
M 577
 
4.7%
l 551
 
4.5%
o 519
 
4.3%
Other values (50) 4445
36.5%
ValueCountFrequency (%)
1384
 
11.4%
r 994
 
8.2%
a 842
 
7.0%
e 839
 
6.9%
i 692
 
5.7%
s 655
 
5.4%
n 649
 
5.4%
M 571
 
4.7%
l 538
 
4.4%
o 512
 
4.2%
Other values (49) 4437
36.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7856
64.5%
Uppercase Letter 1841
 
15.1%
Space Separator 1389
 
11.4%
Other Punctuation 954
 
7.8%
Close Punctuation 71
 
0.6%
Open Punctuation 71
 
0.6%
Dash Punctuation 7
 
0.1%
ValueCountFrequency (%)
Lowercase Letter 7785
64.3%
Uppercase Letter 1843
 
15.2%
Space Separator 1384
 
11.4%
Other Punctuation 942
 
7.8%
Close Punctuation 77
 
0.6%
Open Punctuation 77
 
0.6%
Dash Punctuation 5
 
< 0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1389
100.0%
ValueCountFrequency (%)
1384
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 1000
12.7%
e 880
11.2%
a 827
10.5%
n 688
8.8%
i 676
8.6%
s 637
8.1%
l 551
 
7.0%
o 519
 
6.6%
t 327
 
4.2%
h 282
 
3.6%
Other values (16) 1469
18.7%
ValueCountFrequency (%)
r 994
12.8%
a 842
10.8%
e 839
10.8%
i 692
8.9%
s 655
8.4%
n 649
8.3%
l 538
 
6.9%
o 512
 
6.6%
t 323
 
4.1%
h 260
 
3.3%
Other values (16) 1481
19.0%
Uppercase Letter
ValueCountFrequency (%)
M 577
31.3%
A 127
 
6.9%
J 114
 
6.2%
S 90
 
4.9%
C 88
 
4.8%
H 88
 
4.8%
E 87
 
4.7%
B 71
 
3.9%
W 71
 
3.9%
L 64
 
3.5%
Other values (15) 464
25.2%
ValueCountFrequency (%)
M 571
31.0%
A 124
 
6.7%
J 111
 
6.0%
H 104
 
5.6%
E 92
 
5.0%
C 87
 
4.7%
S 79
 
4.3%
W 77
 
4.2%
B 72
 
3.9%
R 61
 
3.3%
Other values (15) 465
25.2%
Other Punctuation
ValueCountFrequency (%)
. 447
46.9%
, 446
46.8%
" 54
 
5.7%
' 6
 
0.6%
/ 1
 
0.1%
ValueCountFrequency (%)
, 446
47.3%
. 446
47.3%
" 46
 
4.9%
' 4
 
0.4%
Close Punctuation
ValueCountFrequency (%)
) 71
100.0%
ValueCountFrequency (%)
) 77
100.0%
Open Punctuation
ValueCountFrequency (%)
( 71
100.0%
ValueCountFrequency (%)
( 77
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 7
100.0%
ValueCountFrequency (%)
- 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9697
79.6%
Common 2492
 
20.4%
ValueCountFrequency (%)
Latin 9628
79.5%
Common 2485
 
20.5%

Most frequent character per script

Common
ValueCountFrequency (%)
1389
55.7%
. 447
 
17.9%
, 446
 
17.9%
) 71
 
2.8%
( 71
 
2.8%
" 54
 
2.2%
- 7
 
0.3%
' 6
 
0.2%
/ 1
 
< 0.1%
ValueCountFrequency (%)
1384
55.7%
, 446
 
17.9%
. 446
 
17.9%
) 77
 
3.1%
( 77
 
3.1%
" 46
 
1.9%
- 5
 
0.2%
' 4
 
0.2%
Latin
ValueCountFrequency (%)
r 1000
 
10.3%
e 880
 
9.1%
a 827
 
8.5%
n 688
 
7.1%
i 676
 
7.0%
s 637
 
6.6%
M 577
 
6.0%
l 551
 
5.7%
o 519
 
5.4%
t 327
 
3.4%
Other values (41) 3015
31.1%
ValueCountFrequency (%)
r 994
 
10.3%
a 842
 
8.7%
e 839
 
8.7%
i 692
 
7.2%
s 655
 
6.8%
n 649
 
6.7%
M 571
 
5.9%
l 538
 
5.6%
o 512
 
5.3%
t 323
 
3.4%
Other values (41) 3013
31.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12189
100.0%
ValueCountFrequency (%)
ASCII 12113
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1389
 
11.4%
r 1000
 
8.2%
e 880
 
7.2%
a 827
 
6.8%
n 688
 
5.6%
i 676
 
5.5%
s 637
 
5.2%
M 577
 
4.7%
l 551
 
4.5%
o 519
 
4.3%
Other values (50) 4445
36.5%
ValueCountFrequency (%)
1384
 
11.4%
r 994
 
8.2%
a 842
 
7.0%
e 839
 
6.9%
i 692
 
5.7%
s 655
 
5.4%
n 649
 
5.4%
M 571
 
4.7%
l 538
 
4.4%
o 512
 
4.2%
Other values (49) 4437
36.6%

Sex
Categorical

Statistic      Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value    Dataset A   Dataset B
male     286         285
female   160         161

Length

Statistic       Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7174888   4.7219731
Min length      4           4
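The mean length can be reproduced from the category counts alone, since every value is either "male" (4 characters) or "female" (6 characters). The sketch below does this for both datasets; the helper name is illustrative.

```python
def length_stats(value_counts: dict) -> tuple:
    """Total characters and mean string length from a value -> count mapping."""
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    n_values = sum(value_counts.values())
    return total_chars, total_chars / n_values


total_a, mean_a = length_stats({"male": 286, "female": 160})
total_b, mean_b = length_stats({"male": 285, "female": 161})
print(total_a, round(mean_a, 7), total_b, round(mean_b, 7))
```

The totals, 2104 and 2106, are the same figures the report lists as total characters for Sex under Characters and Unicode.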

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2104        2106
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

Row       Dataset A   Dataset B
1st row   female      male
2nd row   male        female
3rd row   male        female
4th row   female      male
5th row   male        male

Common Values

ValueCountFrequency (%)
male 286
64.1%
female 160
35.9%
ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Dataset B

ValueCountFrequency (%)
male 286
64.1%
female 160
35.9%
ValueCountFrequency (%)
male 285
63.9%
female 161
36.1%

Most occurring characters

ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2104
100.0%
ValueCountFrequency (%)
Lowercase Letter 2106
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 2104
100.0%
ValueCountFrequency (%)
Latin 2106
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2104
100.0%
ValueCountFrequency (%)
ASCII 2106
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%
ValueCountFrequency (%)
e 607
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 161
 
7.6%

Age
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       76          73
Distinct (%)   20.9%       20.1%
Missing        82          82
Missing (%)    18.4%       18.4%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           29.88783    29.081264
Minimum        0.42        0.75
Maximum        80          71
Zeros          0           0
Zeros (%)      0.0%        0.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0.42        0.75
5-th percentile             4.15        5.15
Q1                          20          19.75
median                      28          27.5
Q3                          39          38
95-th percentile            59.7        54.85
Maximum                     80          71
Range                       79.58       70.25
Interquartile range (IQR)   19          18.25

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                15.119665       14.321352
Coefficient of variation (CV)     0.50588032      0.49245975
Kurtosis                          0.31051106      0.029531589
Mean                              29.88783        29.081264
Median Absolute Deviation (MAD)   9               8.5
Skewness                          0.48654229      0.4206788
Sum                               10879.17        10585.58
Variance                          228.60426       205.10112
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
19 18
 
4.0%
30 15
 
3.4%
28 14
 
3.1%
18 14
 
3.1%
22 12
 
2.7%
24 12
 
2.7%
21 12
 
2.7%
31 11
 
2.5%
39 11
 
2.5%
36 11
 
2.5%
Other values (66) 234
52.5%
(Missing) 82
 
18.4%
ValueCountFrequency (%)
19 16
 
3.6%
25 15
 
3.4%
22 15
 
3.4%
18 14
 
3.1%
24 14
 
3.1%
21 13
 
2.9%
32 12
 
2.7%
30 11
 
2.5%
16 11
 
2.5%
28 11
 
2.5%
Other values (63) 232
52.0%
(Missing) 82
 
18.4%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.67 1
 
0.2%
0.75 1
 
0.2%
0.83 1
 
0.2%
1 3
0.7%
2 6
1.3%
3 1
 
0.2%
4 5
1.1%
5 3
0.7%
6 3
0.7%
ValueCountFrequency (%)
0.75 1
 
0.2%
0.83 1
 
0.2%
1 5
1.1%
2 3
0.7%
3 3
0.7%
4 5
1.1%
5 1
 
0.2%
6 3
0.7%
7 3
0.7%
8 2
 
0.4%

SibSp
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       7            7
Distinct (%)   1.6%         1.6%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.54484305   0.54932735
Minimum        0            0
Maximum        8            8
Zeros          295          294
Zeros (%)      66.1%        65.9%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           2
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                1.0898955       1.1242124
Coefficient of variation (CV)     2.0003844       2.0465254
Kurtosis                          16.405745       18.251242
Mean                              0.54484305      0.54932735
Median Absolute Deviation (MAD)   0               0
Skewness                          3.4990617       3.7370127
Sum                               243             245
Variance                          1.1878722       1.2638535
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 295
66.1%
1 114
 
25.6%
2 12
 
2.7%
4 11
 
2.5%
3 9
 
2.0%
8 3
 
0.7%
5 2
 
0.4%
ValueCountFrequency (%)
0 294
65.9%
1 115
 
25.8%
2 16
 
3.6%
4 11
 
2.5%
8 4
 
0.9%
3 4
 
0.9%
5 2
 
0.4%
ValueCountFrequency (%)
0 295
66.1%
1 114
 
25.6%
2 12
 
2.7%
3 9
 
2.0%
4 11
 
2.5%
5 2
 
0.4%
8 3
 
0.7%
ValueCountFrequency (%)
0 294
65.9%
1 115
 
25.8%
2 16
 
3.6%
3 4
 
0.9%
4 11
 
2.5%
5 2
 
0.4%
8 4
 
0.9%

Parch
Real number (ℝ)

Statistic      Dataset A    Dataset B
Distinct       6            6
Distinct (%)   1.3%         1.3%
Missing        0            0
Missing (%)    0.0%         0.0%
Infinite       0            0
Infinite (%)   0.0%         0.0%
Mean           0.40134529   0.39686099
Minimum        0            0
Maximum        5            5
Zeros          336          333
Zeros (%)      75.3%        74.7%
Negative       0            0
Negative (%)   0.0%         0.0%
Memory size    7.0 KiB      7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           1
95-th percentile            2           2
Maximum                     5           5
Range                       5           5
Interquartile range (IQR)   0           1

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                0.83081007      0.80274194
Coefficient of variation (CV)     2.0700631       2.0227283
Kurtosis                          7.9986247       8.3339514
Mean                              0.40134529      0.39686099
Median Absolute Deviation (MAD)   0               0
Skewness                          2.5835274       2.564672
Sum                               179             177
Variance                          0.69024538      0.64439462
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
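A fixed-size-bin histogram splits the interval from the minimum to the maximum into equal-width bins and counts the values in each. The sketch below is a plain-Python version of that idea, not necessarily the routine the profiler uses; the Parch values are rebuilt from the counts reported for Dataset A, and the function name is illustrative.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so all fall in one bin.
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum sits on the right edge, so clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


# Parch values for Dataset A, rebuilt from the counts reported in this section.
parch_a = [0] * 336 + [1] * 60 + [2] * 40 + [3] * 4 + [4] * 3 + [5] * 3
parch_counts = fixed_width_histogram(parch_a, bins=6)
print(parch_counts)
```

With six bins over the range 0 to 5, each integer value lands in its own bin, so the bin counts are `[336, 60, 40, 4, 3, 3]`.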
ValueCountFrequency (%)
0 336
75.3%
1 60
 
13.5%
2 40
 
9.0%
3 4
 
0.9%
5 3
 
0.7%
4 3
 
0.7%
ValueCountFrequency (%)
0 333
74.7%
1 64
 
14.3%
2 42
 
9.4%
5 3
 
0.7%
4 2
 
0.4%
3 2
 
0.4%
ValueCountFrequency (%)
0 336
75.3%
1 60
 
13.5%
2 40
 
9.0%
3 4
 
0.9%
4 3
 
0.7%
5 3
 
0.7%
ValueCountFrequency (%)
0 333
74.7%
1 64
 
14.3%
2 42
 
9.4%
3 2
 
0.4%
4 2
 
0.4%
5 3
 
0.7%

Ticket
Categorical

Statistic      Dataset A   Dataset B
Distinct       379         376
Distinct (%)   85.0%       84.3%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Dataset A
Value                Count
347082               6
19950                4
3101295              4
W./C. 6608           4
13502                3
Other values (374)   425

Dataset B
Value                Count
347082               5
3101295              4
1601                 4
CA. 2343             4
LINE                 3
Other values (371)   426

Length

Statistic       Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.6860987   6.7511211
Min length      3           3

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      2982        3011
Distinct characters   32          32
Distinct categories   5           5
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       330         322
Unique (%)   74.0%       72.2%

Sample

Row       Dataset A          Dataset B
1st row   250655             349912
2nd row   233866             C 7077
3rd row   STON/O2. 3101279   347054
4th row   STON/O2. 3101282   W/C 14208
5th row   371362             112050

Common Values

ValueCountFrequency (%)
347082 6
 
1.3%
19950 4
 
0.9%
3101295 4
 
0.9%
W./C. 6608 4
 
0.9%
13502 3
 
0.7%
347742 3
 
0.7%
347088 3
 
0.7%
SC/Paris 2123 3
 
0.7%
363291 3
 
0.7%
CA. 2343 3
 
0.7%
Other values (369) 410
91.9%
ValueCountFrequency (%)
347082 5
 
1.1%
3101295 4
 
0.9%
1601 4
 
0.9%
CA. 2343 4
 
0.9%
LINE 3
 
0.7%
W./C. 6608 3
 
0.7%
110413 3
 
0.7%
382652 3
 
0.7%
17421 3
 
0.7%
345773 3
 
0.7%
Other values (366) 411
92.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Dataset B


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
ValueCountFrequency (%)
pc 27
 
4.9%
c.a 15
 
2.7%
a/5 10
 
1.8%
sc/paris 7
 
1.3%
347082 6
 
1.1%
ca 6
 
1.1%
w./c 5
 
0.9%
3101295 4
 
0.7%
6608 4
 
0.7%
soton/oq 4
 
0.7%
Other values (394) 468
84.2%
ValueCountFrequency (%)
pc 25
 
4.4%
c.a 13
 
2.3%
a/5 9
 
1.6%
ston/o 7
 
1.2%
2 7
 
1.2%
ca 6
 
1.1%
347082 5
 
0.9%
sc/paris 5
 
0.9%
w./c 5
 
0.9%
soton/o.q 4
 
0.7%
Other values (395) 476
84.7%

Most occurring characters

ValueCountFrequency (%)
3 373
12.5%
1 345
11.6%
2 289
9.7%
7 243
8.1%
4 237
7.9%
0 206
 
6.9%
5 203
 
6.8%
6 201
 
6.7%
9 172
 
5.8%
8 141
 
4.7%
Other values (22) 572
19.2%
ValueCountFrequency (%)
3 380
12.6%
1 340
11.3%
2 297
9.9%
7 259
8.6%
4 240
8.0%
6 210
 
7.0%
0 200
 
6.6%
5 179
 
5.9%
9 169
 
5.6%
8 138
 
4.6%
Other values (22) 599
19.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2410
80.8%
Uppercase Letter 305
 
10.2%
Other Punctuation 141
 
4.7%
Space Separator 110
 
3.7%
Lowercase Letter 16
 
0.5%
ValueCountFrequency (%)
Decimal Number 2412
80.1%
Uppercase Letter 329
 
10.9%
Other Punctuation 146
 
4.8%
Space Separator 116
 
3.9%
Lowercase Letter 8
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 373
15.5%
1 345
14.3%
2 289
12.0%
7 243
10.1%
4 237
9.8%
0 206
8.5%
5 203
8.4%
6 201
8.3%
9 172
7.1%
8 141
 
5.9%
ValueCountFrequency (%)
3 380
15.8%
1 340
14.1%
2 297
12.3%
7 259
10.7%
4 240
10.0%
6 210
8.7%
0 200
8.3%
5 179
7.4%
9 169
7.0%
8 138
 
5.7%
Space Separator
ValueCountFrequency (%)
110
100.0%
ValueCountFrequency (%)
116
100.0%
Other Punctuation
ValueCountFrequency (%)
. 92
65.2%
/ 49
34.8%
ValueCountFrequency (%)
. 92
63.0%
/ 54
37.0%
Uppercase Letter
ValueCountFrequency (%)
C 73
23.9%
P 49
16.1%
A 43
14.1%
O 40
13.1%
S 35
11.5%
N 16
 
5.2%
T 14
 
4.6%
Q 7
 
2.3%
W 7
 
2.3%
I 7
 
2.3%
Other values (5) 14
 
4.6%
ValueCountFrequency (%)
C 71
21.6%
O 53
16.1%
P 42
12.8%
A 41
12.5%
S 38
11.6%
N 23
 
7.0%
T 20
 
6.1%
W 10
 
3.0%
I 7
 
2.1%
Q 7
 
2.1%
Other values (5) 17
 
5.2%
Lowercase Letter
ValueCountFrequency (%)
a 4
25.0%
r 4
25.0%
i 4
25.0%
s 4
25.0%
ValueCountFrequency (%)
a 2
25.0%
r 2
25.0%
i 2
25.0%
s 2
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2661
89.2%
Latin 321
 
10.8%
ValueCountFrequency (%)
Common 2674
88.8%
Latin 337
 
11.2%

Most frequent character per script

Common
ValueCountFrequency (%)
3 373
14.0%
1 345
13.0%
2 289
10.9%
7 243
9.1%
4 237
8.9%
0 206
7.7%
5 203
7.6%
6 201
7.6%
9 172
6.5%
8 141
 
5.3%
Other values (3) 251
9.4%
ValueCountFrequency (%)
3 380
14.2%
1 340
12.7%
2 297
11.1%
7 259
9.7%
4 240
9.0%
6 210
7.9%
0 200
7.5%
5 179
6.7%
9 169
6.3%
8 138
 
5.2%
Other values (3) 262
9.8%
Latin
ValueCountFrequency (%)
C 73
22.7%
P 49
15.3%
A 43
13.4%
O 40
12.5%
S 35
10.9%
N 16
 
5.0%
T 14
 
4.4%
Q 7
 
2.2%
W 7
 
2.2%
I 7
 
2.2%
Other values (9) 30
9.3%
ValueCountFrequency (%)
C 71
21.1%
O 53
15.7%
P 42
12.5%
A 41
12.2%
S 38
11.3%
N 23
 
6.8%
T 20
 
5.9%
W 10
 
3.0%
I 7
 
2.1%
Q 7
 
2.1%
Other values (9) 25
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2982
100.0%
ValueCountFrequency (%)
ASCII 3011
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 373
12.5%
1 345
11.6%
2 289
9.7%
7 243
8.1%
4 237
7.9%
0 206
 
6.9%
5 203
 
6.8%
6 201
 
6.7%
9 172
 
5.8%
8 141
 
4.7%
Other values (22) 572
19.2%
ValueCountFrequency (%)
3 380
12.6%
1 340
11.3%
2 297
9.9%
7 259
8.6%
4 240
8.0%
6 210
 
7.0%
0 200
 
6.6%
5 179
 
5.9%
9 169
 
5.6%
8 138
 
4.6%
Other values (22) 599
19.9%

Fare
Real number (ℝ)

Statistic      Dataset A   Dataset B
Distinct       176         178
Distinct (%)   39.5%       39.9%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           31.955539   31.655884
Minimum        0           0
Maximum        263         263
Zeros          6           10
Zeros (%)      1.3%        2.2%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

Statistic                   Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.162525
Q1                          7.925       7.8958
median                      14.5        14.47915
Q3                          31.275      31.275
95-th percentile            108.28125   112.67708
Maximum                     263         263
Range                       263         263
Interquartile range (IQR)   23.35       23.3792

Descriptive statistics

Statistic                         Dataset A       Dataset B
Standard deviation                43.157289       43.17571
Coefficient of variation (CV)     1.3505417       1.3639079
Kurtosis                          12.330276       11.1608
Mean                              31.955539       31.655884
Median Absolute Deviation (MAD)   7.0042          7.22915
Skewness                          3.243237        3.1029131
Sum                               14252.171       14118.524
Variance                          1862.5516       1864.1419
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8.05 22
 
4.9%
13 20
 
4.5%
7.8958 19
 
4.3%
7.75 18
 
4.0%
26 14
 
3.1%
10.5 12
 
2.7%
7.8542 9
 
2.0%
7.225 7
 
1.6%
7.925 7
 
1.6%
7.25 7
 
1.6%
Other values (166) 311
69.7%
ValueCountFrequency (%)
7.8958 24
 
5.4%
8.05 23
 
5.2%
26 19
 
4.3%
10.5 14
 
3.1%
7.75 14
 
3.1%
13 14
 
3.1%
7.925 11
 
2.5%
0 10
 
2.2%
7.225 9
 
2.0%
7.8542 8
 
1.8%
Other values (168) 300
67.3%
ValueCountFrequency (%)
0 6
1.3%
4.0125 1
 
0.2%
6.2375 1
 
0.2%
6.4375 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.975 1
 
0.2%
7.05 2
 
0.4%
7.0542 1
 
0.2%
ValueCountFrequency (%)
0 10
2.2%
6.4375 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 1
 
0.2%
7.05 2
 
0.4%
7.0542 1
 
0.2%

Cabin
Categorical

Statistic      Dataset A   Dataset B
Distinct       98          81
Distinct (%)   86.7%       80.2%
Missing        333         345
Missing (%)    74.7%       77.4%
Memory size    7.0 KiB     7.0 KiB

Dataset A
Value               Count
C23 C25 C27         4
G6                  3
F33                 3
C78                 2
C124                2
Other values (93)   99

Dataset B
Value               Count
F33                 3
E25                 2
E33                 2
E67                 2
C123                2
Other values (76)   90

Length

Statistic       Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.4955752   3.6633663
Min length      1           1

Characters and Unicode

Statistic             Dataset A   Dataset B
Total characters      395         370
Distinct characters   19          18
Distinct categories   3           3
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic    Dataset A   Dataset B
Unique       87          62
Unique (%)   77.0%       61.4%

Sample

Row       Dataset A     Dataset B
1st row   C126          G6
2nd row   D15           A36
3rd row   C125          B18
4th row   C23 C25 C27   C68
5th row   E49           C91

Common Values

ValueCountFrequency (%)
C23 C25 C27 4
 
0.9%
G6 3
 
0.7%
F33 3
 
0.7%
C78 2
 
0.4%
C124 2
 
0.4%
C65 2
 
0.4%
D 2
 
0.4%
E121 2
 
0.4%
C83 2
 
0.4%
C126 2
 
0.4%
Other values (88) 89
 
20.0%
(Missing) 333
74.7%
ValueCountFrequency (%)
F33 3
 
0.7%
E25 2
 
0.4%
E33 2
 
0.4%
E67 2
 
0.4%
C123 2
 
0.4%
C22 C26 2
 
0.4%
G6 2
 
0.4%
B58 B60 2
 
0.4%
D20 2
 
0.4%
B96 B98 2
 
0.4%
Other values (71) 80
 
17.9%
(Missing) 345
77.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Dataset A


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)

Dataset B


Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
ValueCountFrequency (%)
c23 4
 
3.1%
c27 4
 
3.1%
c25 4
 
3.1%
g6 3
 
2.3%
f33 3
 
2.3%
c78 2
 
1.6%
c124 2
 
1.6%
c65 2
 
1.6%
d 2
 
1.6%
e121 2
 
1.6%
Other values (98) 101
78.3%
ValueCountFrequency (%)
f33 3
 
2.5%
b20 2
 
1.7%
e25 2
 
1.7%
c92 2
 
1.7%
b66 2
 
1.7%
b63 2
 
1.7%
b59 2
 
1.7%
c93 2
 
1.7%
b18 2
 
1.7%
c68 2
 
1.7%
Other values (83) 98
82.4%

Most occurring characters

Dataset A

Value | Count | Frequency (%)
C | 44 | 11.1%
2 | 43 | 10.9%
1 | 33 | 8.4%
3 | 32 | 8.1%
B | 26 | 6.6%
5 | 25 | 6.3%
6 | 24 | 6.1%
7 | 23 | 5.8%
4 | 22 | 5.6%
D | 21 | 5.3%
Other values (9) | 102 | 25.8%

Dataset B

Value | Count | Frequency (%)
2 | 39 | 10.5%
B | 38 | 10.3%
C | 32 | 8.6%
3 | 31 | 8.4%
6 | 28 | 7.6%
1 | 24 | 6.5%
4 | 21 | 5.7%
5 | 20 | 5.4%
7 | 20 | 5.4%
8 | 20 | 5.4%
Other values (8) | 97 | 26.2%

Most occurring categories

Dataset A

Value | Count | Frequency (%)
Decimal Number | 250 | 63.3%
Uppercase Letter | 129 | 32.7%
Space Separator | 16 | 4.1%

Dataset B

Value | Count | Frequency (%)
Decimal Number | 233 | 63.0%
Uppercase Letter | 119 | 32.2%
Space Separator | 18 | 4.9%

Most frequent character per category

Uppercase Letter

Dataset A

Value | Count | Frequency (%)
C | 44 | 34.1%
B | 26 | 20.2%
D | 21 | 16.3%
E | 18 | 14.0%
A | 9 | 7.0%
F | 6 | 4.7%
G | 4 | 3.1%
T | 1 | 0.8%

Dataset B

Value | Count | Frequency (%)
B | 38 | 31.9%
C | 32 | 26.9%
D | 20 | 16.8%
E | 16 | 13.4%
F | 6 | 5.0%
A | 4 | 3.4%
G | 3 | 2.5%

Decimal Number

Dataset A

Value | Count | Frequency (%)
2 | 43 | 17.2%
1 | 33 | 13.2%
3 | 32 | 12.8%
5 | 25 | 10.0%
6 | 24 | 9.6%
7 | 23 | 9.2%
4 | 22 | 8.8%
8 | 20 | 8.0%
0 | 18 | 7.2%
9 | 10 | 4.0%

Dataset B

Value | Count | Frequency (%)
2 | 39 | 16.7%
3 | 31 | 13.3%
6 | 28 | 12.0%
1 | 24 | 10.3%
4 | 21 | 9.0%
5 | 20 | 8.6%
7 | 20 | 8.6%
8 | 20 | 8.6%
9 | 16 | 6.9%
0 | 14 | 6.0%

Space Separator

Dataset A

Value | Count | Frequency (%)
(space) | 16 | 100.0%

Dataset B

Value | Count | Frequency (%)
(space) | 18 | 100.0%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Common | 266 | 67.3%
Latin | 129 | 32.7%

Dataset B

Value | Count | Frequency (%)
Common | 251 | 67.8%
Latin | 119 | 32.2%

Most frequent character per script

Latin

Dataset A

Value | Count | Frequency (%)
C | 44 | 34.1%
B | 26 | 20.2%
D | 21 | 16.3%
E | 18 | 14.0%
A | 9 | 7.0%
F | 6 | 4.7%
G | 4 | 3.1%
T | 1 | 0.8%

Dataset B

Value | Count | Frequency (%)
B | 38 | 31.9%
C | 32 | 26.9%
D | 20 | 16.8%
E | 16 | 13.4%
F | 6 | 5.0%
A | 4 | 3.4%
G | 3 | 2.5%

Common

Dataset A

Value | Count | Frequency (%)
2 | 43 | 16.2%
1 | 33 | 12.4%
3 | 32 | 12.0%
5 | 25 | 9.4%
6 | 24 | 9.0%
7 | 23 | 8.6%
4 | 22 | 8.3%
8 | 20 | 7.5%
0 | 18 | 6.8%
(space) | 16 | 6.0%

Dataset B

Value | Count | Frequency (%)
2 | 39 | 15.5%
3 | 31 | 12.4%
6 | 28 | 11.2%
1 | 24 | 9.6%
4 | 21 | 8.4%
5 | 20 | 8.0%
7 | 20 | 8.0%
8 | 20 | 8.0%
(space) | 18 | 7.2%
9 | 16 | 6.4%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 395 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 370 | 100.0%

Most frequent character per block

ASCII

Dataset A

Value | Count | Frequency (%)
C | 44 | 11.1%
2 | 43 | 10.9%
1 | 33 | 8.4%
3 | 32 | 8.1%
B | 26 | 6.6%
5 | 25 | 6.3%
6 | 24 | 6.1%
7 | 23 | 5.8%
4 | 22 | 5.6%
D | 21 | 5.3%
Other values (9) | 102 | 25.8%

Dataset B

Value | Count | Frequency (%)
2 | 39 | 10.5%
B | 38 | 10.3%
C | 32 | 8.6%
3 | 31 | 8.4%
6 | 28 | 7.6%
1 | 24 | 6.5%
4 | 21 | 5.7%
5 | 20 | 5.4%
7 | 20 | 5.4%
8 | 20 | 5.4%
Other values (8) | 97 | 26.2%

Embarked
Categorical

 | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 1 | 1
Missing (%) | 0.2% | 0.2%
Memory size | 7.0 KiB | 7.0 KiB

Value | Dataset A | Dataset B
S | 321 | 334
C | 83 | 76
Q | 41 | 35

Length

 | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

 | Dataset A | Dataset B
Total characters | 445 | 445
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

 | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

 | Dataset A | Dataset B
1st row | S | S
2nd row | S | S
3rd row | S | S
4th row | S | S
5th row | S | S

Common Values

Dataset A

Value | Count | Frequency (%)
S | 321 | 72.0%
C | 83 | 18.6%
Q | 41 | 9.2%
(Missing) | 1 | 0.2%

Dataset B

Value | Count | Frequency (%)
S | 334 | 74.9%
C | 76 | 17.0%
Q | 35 | 7.8%
(Missing) | 1 | 0.2%
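
The Common Values percentages are consistent with dividing by all 446 observations, missing value included, whereas the character-level tables later in this section divide by the 445 characters actually present. That would explain why S is listed as 72.0% here and 72.1% there. The check below is a small sketch of that arithmetic using the counts from this report; `share` is a hypothetical helper.

```python
def share(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)

N_ROWS = 446      # observations per dataset
N_MISSING = 1     # missing Embarked values per dataset

# Dataset A, value S (321 occurrences)
s_a_all_rows = share(321, N_ROWS)               # missing rows in the denominator
s_a_present = share(321, N_ROWS - N_MISSING)    # non-missing only

# Dataset B, value S (334 occurrences)
s_b_all_rows = share(334, N_ROWS)
s_b_present = share(334, N_ROWS - N_MISSING)
```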

Length

Histogram of lengths of the category

Common Values (Plot)

Plots for Dataset A and Dataset B (images not included).

Dataset A

Value | Count | Frequency (%)
s | 321 | 72.1%
c | 83 | 18.7%
q | 41 | 9.2%

Dataset B

Value | Count | Frequency (%)
s | 334 | 75.1%
c | 76 | 17.1%
q | 35 | 7.9%

Most occurring characters

Dataset A

Value | Count | Frequency (%)
S | 321 | 72.1%
C | 83 | 18.7%
Q | 41 | 9.2%

Dataset B

Value | Count | Frequency (%)
S | 334 | 75.1%
C | 76 | 17.1%
Q | 35 | 7.9%

Most occurring categories

Dataset A

Value | Count | Frequency (%)
Uppercase Letter | 445 | 100.0%

Dataset B

Value | Count | Frequency (%)
Uppercase Letter | 445 | 100.0%

Most frequent character per category

Uppercase Letter

Dataset A

Value | Count | Frequency (%)
S | 321 | 72.1%
C | 83 | 18.7%
Q | 41 | 9.2%

Dataset B

Value | Count | Frequency (%)
S | 334 | 75.1%
C | 76 | 17.1%
Q | 35 | 7.9%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Dataset B

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Most frequent character per script

Latin

Dataset A

Value | Count | Frequency (%)
S | 321 | 72.1%
C | 83 | 18.7%
Q | 41 | 9.2%

Dataset B

Value | Count | Frequency (%)
S | 334 | 75.1%
C | 76 | 17.1%
Q | 35 | 7.9%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Most frequent character per block

ASCII

Dataset A

Value | Count | Frequency (%)
S | 321 | 72.1%
C | 83 | 18.7%
Q | 41 | 9.2%

Dataset B

Value | Count | Frequency (%)
S | 334 | 75.1%
C | 76 | 17.1%
Q | 35 | 7.9%

Interactions

Interaction plots for Dataset A and Dataset B (25 per dataset; images not included).

Correlations

Correlation plots for Dataset A and Dataset B (images not included).

Dataset A

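 | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked
PassengerId | 1.000 | 0.044 | -0.012 | 0.007 | -0.035 | 0.106 | 0.000 | 0.000 | 0.089 | 0.038
Age | 0.044 | 1.000 | -0.184 | -0.234 | 0.165 | 0.175 | 0.295 | 0.106 | 0.223 | 0.140
SibSp | -0.012 | -0.184 | 1.000 | 0.437 | 0.442 | 0.134 | 0.125 | 0.183 | 0.000 | 0.066
Parch | 0.007 | -0.234 | 0.437 | 1.000 | 0.416 | 0.121 | 0.000 | 0.268 | 0.000 | 0.080
Fare | -0.035 | 0.165 | 0.442 | 0.416 | 1.000 | 0.258 | 0.532 | 0.234 | 0.380 | 0.230
Survived | 0.106 | 0.175 | 0.134 | 0.121 | 0.258 | 1.000 | 0.324 | 0.537 | 0.000 | 0.122
Pclass | 0.000 | 0.295 | 0.125 | 0.000 | 0.532 | 0.324 | 1.000 | 0.109 | 0.369 | 0.244
Sex | 0.000 | 0.106 | 0.183 | 0.268 | 0.234 | 0.537 | 0.109 | 1.000 | 0.000 | 0.077
Cabin | 0.089 | 0.223 | 0.000 | 0.000 | 0.380 | 0.000 | 0.369 | 0.000 | 1.000 | 0.356
Embarked | 0.038 | 0.140 | 0.066 | 0.080 | 0.230 | 0.122 | 0.244 | 0.077 | 0.356 | 1.000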

Dataset B

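 | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked
PassengerId | 1.000 | 0.020 | -0.055 | 0.019 | -0.040 | 0.098 | 0.018 | 0.092 | 0.000 | 0.000
Age | 0.020 | 1.000 | -0.202 | -0.280 | 0.120 | 0.129 | 0.292 | 0.037 | 0.269 | 0.000
SibSp | -0.055 | -0.202 | 1.000 | 0.425 | 0.456 | 0.178 | 0.125 | 0.178 | 0.446 | 0.000
Parch | 0.019 | -0.280 | 0.425 | 1.000 | 0.471 | 0.216 | 0.072 | 0.300 | 0.419 | 0.022
Fare | -0.040 | 0.120 | 0.456 | 0.471 | 1.000 | 0.271 | 0.534 | 0.187 | 0.459 | 0.232
Survived | 0.098 | 0.129 | 0.178 | 0.216 | 0.271 | 1.000 | 0.316 | 0.514 | 0.297 | 0.122
Pclass | 0.018 | 0.292 | 0.125 | 0.072 | 0.534 | 0.316 | 1.000 | 0.131 | 0.452 | 0.230
Sex | 0.092 | 0.037 | 0.178 | 0.300 | 0.187 | 0.514 | 0.131 | 1.000 | 0.000 | 0.000
Cabin | 0.000 | 0.269 | 0.446 | 0.419 | 0.459 | 0.297 | 0.452 | 0.000 | 1.000 | 0.454
Embarked | 0.000 | 0.000 | 0.000 | 0.022 | 0.232 | 0.122 | 0.230 | 0.000 | 0.454 | 1.000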
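
The "High Correlation" alerts in the overview (Fare with Pclass, Survived with Sex) line up with the only off-diagonal entries at or above 0.5 in these matrices. The sketch below flags such pairs from an excerpt of the Dataset A matrix. The 0.5 cut-off is an assumption inferred from that match, not a value stated in this report, and `high_pairs` is a hypothetical helper.

```python
# Excerpt of the Dataset A correlation matrix (upper triangle only).
corr_a = {
    ("Fare", "Survived"): 0.258,
    ("Fare", "Pclass"): 0.532,
    ("Fare", "Sex"): 0.234,
    ("Survived", "Pclass"): 0.324,
    ("Survived", "Sex"): 0.537,
    ("Pclass", "Sex"): 0.109,
}

THRESHOLD = 0.5  # assumed cut-off, chosen to be consistent with the alerts

def high_pairs(corr, threshold=THRESHOLD):
    """Return the variable pairs whose absolute correlation meets the threshold."""
    return sorted(pair for pair, value in corr.items() if abs(value) >= threshold)

flagged = high_pairs(corr_a)
```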

Missing values

Dataset A

A simple visualization of nullity by column.

Dataset B

A simple visualization of nullity by column.

Dataset A

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Dataset B

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.

Dataset A

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
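
One way to read "nullity correlation" is as the Pearson correlation between the presence indicators (1 = present, 0 = missing) of two columns. The sketch below takes that reading as an assumption; it is not the plotting library's code. The helper name is hypothetical and the two short columns are toy values, not rows from either dataset.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is fully present or fully missing
    return cov / sqrt(var_a * var_b)

# Toy columns: the second is missing wherever the first is, plus one more row.
toy_age = [22.0, None, 26.0, None, 35.0, 54.0]
toy_cabin = ["C85", None, None, None, "C123", "E46"]
r = nullity_correlation(toy_age, toy_cabin)
```

A value near 1 means the two columns tend to be present or missing together; a value near -1 means one tends to be present when the other is missing.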

Sample

Dataset A

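 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
427 | 428 | 1 | 2 | Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | female | 19.0 | 0 | 0 | 250655 | 26.0000 | NaN | S
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S
403 | 404 | 0 | 3 | Hakkarainen, Mr. Pekka Pietari | male | 28.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
160 | 161 | 0 | 3 | Cribb, Mr. John Hatfield | male | 44.0 | 0 | 1 | 371362 | 16.1000 | NaN | S
811 | 812 | 0 | 3 | Lester, Mr. James | male | 39.0 | 0 | 0 | A/4 48871 | 24.1500 | NaN | S
733 | 734 | 0 | 2 | Berriman, Mr. William John | male | 23.0 | 0 | 0 | 28425 | 13.0000 | NaN | S
421 | 422 | 0 | 3 | Charters, Mr. David | male | 21.0 | 0 | 0 | A/5. 13032 | 7.7333 | NaN | Q
221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.0 | 0 | 0 | 220367 | 13.0000 | NaN | S
131 | 132 | 0 | 3 | Coelho, Mr. Domingos Fernandeo | male | 20.0 | 0 | 0 | SOTON/O.Q. 3101307 | 7.0500 | NaN | S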

Dataset B

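 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
675 | 676 | 0 | 3 | Edvardsson, Mr. Gustaf Hjalmar | male | 18.0 | 0 | 0 | 349912 | 7.7750 | NaN | S
376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S
251 | 252 | 0 | 3 | Strom, Mrs. Wilhelm (Elna Matilda Persson) | female | 29.0 | 1 | 1 | 347054 | 10.4625 | G6 | S
219 | 220 | 0 | 2 | Harris, Mr. Walter | male | 30.0 | 0 | 0 | W/C 14208 | 10.5000 | NaN | S
806 | 807 | 0 | 1 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0000 | A36 | S
575 | 576 | 0 | 3 | Patchett, Mr. George | male | 19.0 | 0 | 0 | 358585 | 14.5000 | NaN | S
545 | 546 | 0 | 1 | Nicholson, Mr. Arthur Ernest | male | 64.0 | 0 | 0 | 693 | 26.0000 | NaN | S
173 | 174 | 0 | 3 | Sivola, Mr. Antti Wilhelm | male | 21.0 | 0 | 0 | STON/O 2. 3101280 | 7.9250 | NaN | S
750 | 751 | 1 | 2 | Wells, Miss. Joan | female | 4.0 | 1 | 1 | 29103 | 23.0000 | NaN | S
354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C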

Dataset A

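 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
66 | 67 | 1 | 2 | Nye, Mrs. (Elizabeth Ramell) | female | 29.00 | 0 | 0 | C.A. 29395 | 10.500 | F33 | S
340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.00 | 1 | 1 | 230080 | 26.000 | F2 | S
491 | 492 | 0 | 3 | Windelov, Mr. Einar | male | 21.00 | 0 | 0 | SOTON/OQ 3101317 | 7.250 | NaN | S
406 | 407 | 0 | 3 | Widegren, Mr. Carl/Charles Peter | male | 51.00 | 0 | 0 | 347064 | 7.750 | NaN | S
37 | 38 | 0 | 3 | Cann, Mr. Ernest Charles | male | 21.00 | 0 | 0 | A./5. 2152 | 8.050 | NaN | S
577 | 578 | 1 | 1 | Silvey, Mrs. William Baird (Alice Munger) | female | 39.00 | 1 | 0 | 13507 | 55.900 | E44 | S
837 | 838 | 0 | 3 | Sirota, Mr. Maurice | male | NaN | 0 | 0 | 392092 | 8.050 | NaN | S
320 | 321 | 0 | 3 | Dennis, Mr. Samuel | male | 22.00 | 0 | 0 | A/5 21172 | 7.250 | NaN | S
755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.500 | NaN | S
400 | 401 | 1 | 3 | Niskanen, Mr. Juha | male | 39.00 | 0 | 0 | STON/O 2. 3101289 | 7.925 | NaN | S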

Dataset B

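 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
569 | 570 | 1 | 3 | Jonsson, Mr. Carl | male | 32.0 | 0 | 0 | 350417 | 7.8542 | NaN | S
764 | 765 | 0 | 3 | Eklund, Mr. Hans Linus | male | 16.0 | 0 | 0 | 347074 | 7.7750 | NaN | S
343 | 344 | 0 | 2 | Sedgwick, Mr. Charles Frederick Waddington | male | 25.0 | 0 | 0 | 244361 | 13.0000 | NaN | S
781 | 782 | 1 | 1 | Dick, Mrs. Albert Adrian (Vera Gillespie) | female | 17.0 | 1 | 0 | 17474 | 57.0000 | B20 | S
579 | 580 | 1 | 3 | Jussila, Mr. Eiriik | male | 32.0 | 0 | 0 | STON/O 2. 3101286 | 7.9250 | NaN | S
738 | 739 | 0 | 3 | Ivanoff, Mr. Kanio | male | NaN | 0 | 0 | 349201 | 7.8958 | NaN | S
813 | 814 | 0 | 3 | Andersson, Miss. Ebba Iris Alfrida | female | 6.0 | 4 | 2 | 347082 | 31.2750 | NaN | S
614 | 615 | 0 | 3 | Brocklebank, Mr. William Alfred | male | 35.0 | 0 | 0 | 364512 | 8.0500 | NaN | S
684 | 685 | 0 | 2 | Brown, Mr. Thomas William Solomon | male | 60.0 | 1 | 1 | 29750 | 39.0000 | NaN | S
449 | 450 | 1 | 1 | Peuchen, Major. Arthur Godfrey | male | 52.0 | 0 | 0 | 113786 | 30.5000 | C104 | S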

Duplicate rows

Dataset A

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | # duplicates
Dataset does not contain duplicate rows.
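
A duplicate-row check of this kind can be sketched with a counter over whole rows. The helper `duplicate_rows` is hypothetical and only illustrates the idea. The example rows reuse a few columns (PassengerId, Survived, Pclass, Sex, Age) from the Dataset A sample above.

```python
from collections import Counter

def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

rows = [
    (428, 1, 2, "female", 19.0),
    (865, 0, 2, "male", 24.0),
    (404, 0, 3, "male", 28.0),
]

no_dupes = duplicate_rows(rows)                 # no row repeats
with_dupe = duplicate_rows(rows + [rows[0]])    # first row appended again
```

Because PassengerId is unique in both datasets, no two full rows can match, which agrees with the empty result reported above.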